Yuxin Li

May

MBench: A Comprehensive Benchmark on Memory Capability for Video World Models

May 30, 2026
Via arXiv

The WER Trap: Shattering the Illusion of Unified Tokens in Speech Language Models

May 28, 2026
Via arXiv

Dual Prototype-Conditioned Diffusion Model for Scalable Multi-Class Unsupervised Anomaly Detection in Large Category Spaces

May 23, 2026
Via arXiv

StepAudio 2.5 Technical Report

May 22, 2026
Via arXiv

DuplexSLA: A Full-Duplex Spoken Language Model with Synchronized Speech, Language, and Action

May 20, 2026
Via arXiv

Step-Audio-R1.5 Technical Report

Apr 28, 2026
Via arXiv

The Silent Thought: Modeling Internal Cognition in Full-Duplex Spoken Dialogue Models via Latent Reasoning

Mar 18, 2026
Via arXiv

DepFlow: Disentangled Speech Generation to Mitigate Semantic Bias in Depression Detection

Jan 01, 2026
Via arXiv

Lean4Physics: Comprehensive Reasoning Framework for College-level Physics in Lean4

Oct 30, 2025
Via arXiv

Step-Audio 2 Technical Report

Jul 24, 2025
Via arXiv